This study was conducted under Contract No. AF 19(628)-1610 at [...]
Hill, North Carolina. Dr. Albert Amon served as principal investigator
and Dr. Anne Story as contract monitor.

Work  was  performed  under  Project  4690  "Information  Processing  in 
Command  and  Control",  Task  469003,  "Human  Information  Processing 
Techniques". 


ABSTRACT 


This paper presents an optimal strategy for sequential
sampling from binomial distributions. The strategy presented
is general in that it is a "multi-action" rather than a
two-action procedure. While the major task is to estimate the
proportion, p, of "successes" in a hypothetical, infinite
population of binary observations, it is assumed that the decision
maker is only concerned with which of a set of mutually exclusive
and exhaustive subsets of the unit interval contains p. The
derived strategy maximizes the decision maker's gain without
regard to error probabilities.

The important variable in determining a rule for ceasing to
look at new data and making a decision is found to be the
expected probability of being correct. The criterion involves only
the economic aspects of the situation. A "no information" theorem
is presented which shows that under some circumstances when a
"success" and a "failure" on a given trial are equally probable,
the probability of being correct after making the observation is
identical to the probability of being correct before the
observation was taken. Finally, an appealing derivation of the
Beta-binomial probability function is given which suggests a more
tractable computational procedure for the distribution and which
illuminates its limiting distribution.
SEQUENTIAL INFORMATION SEEKING: AN OPTIMAL STRATEGY AND OTHER RESULTS


I. Introduction.

It is not uncommon for a person to be faced with the task of having to
evaluate a body of information and to make a decision on the basis of his
evaluation. A problem which naturally arises is how long to spend collecting or
evaluating data before making the decision. This issue is particularly
pertinent when data or information can be obtained by the decision maker but only
at a cost. The decision maker must determine at what point he will cease looking
at new data and make his decision. When a decision maker can at any time choose to
observe more data or choose to make a decision, we say that the sampling
procedure is sequential.

In the quest of psychologists to understand human behavior in general and
human decision making in particular, the study of human information seeking is
of prominent importance. The empirical research of Irwin and Smith (1956, 1957),
Pruitt (1961), Lanzetta and Kanareff (1962), Edwards (1964), and Messick (1964a)
has contributed to the understanding of the problem.

Psychologists themselves are engaged in an almost continual process of
collecting and evaluating data from empirical research. Wald (1947) has shown that
sequential experiments are often more efficient in minimizing required sample
size than experiments whose size is predetermined. Fiske and Jones (1954)
have emphasized this argument for psychologists, who, as scientists, may benefit
from sequential sampling procedures as research tools. For such purposes it is
desired to find procedures which have specific properties, e.g., to minimize
expected sample sizes holding error probabilities constant (Wald, 1947), to
maximize expected utilities holding error probabilities constant (Edwards, 1964),
or to reduce uncertainty to a preassigned level (DeGroot, 1962; Lindley, 1956, 1957).

While  these  two  sources  of  interest  in  sequential  sampling  and  information 
processing  are  distinct,  the  modern  approach  to  the  study  of  the  "rational  man" 
(called  the  "ideal  observer"  in  contemporary  psychophysics)  is  tending  to  bring  the 
two  interests  together.  Becker  (1958),  for  example,  studied  human  sequential 
sampling  from  the  theoretical  point  of  view  of  Wald.  Edwards  (1964)  is  studying 
the  same  behavior  from  a  Bayesian  position. 

The primary purpose of the present report is to derive an optimal strategy for
a special, but not uncommon, situation: that of sequential sampling from binomial
distributions. The strategy presented here is general in that it is a
"multi-action" rather than a two-action procedure. Furthermore, while the major
task is to estimate the proportion, p, of "successes" in a hypothetical, infinite
population of binary observations, it is assumed that the decision maker is only
concerned with which of a set of mutually exclusive and exhaustive subsets of the
unit interval contains p. The derived strategy maximizes the decision maker's
gain without regard to error probabilities.

The results of this report have potential value as a general Bayesian research
strategy. The immediate need to be filled by them is to provide a formal, rational
model of information seeking which can be used to evaluate the "optimality" of
actual human information-seeking behavior (Messick, 1964a, 1964b).
II. An optimal strategy for information seeking.

In deriving optimal strategies for information seeking, attention will be
restricted to the case in which the prior probability distribution is of the Beta
family, with density given by

(1)  $f(p \mid \alpha, \beta) = B(\alpha, \beta)^{-1}\, p^{\alpha-1} (1-p)^{\beta-1}, \quad \alpha, \beta > 0,$

and in which the data generating process is binomial. The restriction of the prior
distribution to the Beta family has been adequately defended by Rapoport (1964) and
Lindley (1957). The task of the decision maker is to select one of s mutually
exclusive and exhaustive subsets of the unit interval which he believes contains p,
the true proportion. We let this decision maker be motivated solely by the desire
to maximize his expected gain, and we assume that he uses Bayes' formula to combine
prior opinion with current information to produce a posteriori opinions. Thus,
given a particular prior distribution with parameters $(\alpha, \beta)$, and given that
our decision maker has observed r "successes" in n Bernoulli trials, the state of
his knowledge concerning p will be represented by a Beta distribution with
parameters $(\alpha + r, \beta + n - r)$.
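The updating step just described can be illustrated with a minimal Python sketch. The function names `beta_posterior` and `beta_mean` and the numerical values are illustrative assumptions, not part of the report.

```python
def beta_posterior(alpha, beta, r, n):
    """Posterior Beta parameters after r successes in n Bernoulli trials."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return alpha + r, beta + n - r


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: prior Beta(2, 3), then 4 successes in 6 trials.
alpha_n, beta_n = beta_posterior(2, 3, r=4, n=6)
print(alpha_n, beta_n, beta_mean(alpha_n, beta_n))
```

Only the two counts r and n - r enter the update, so the order in which successes and failures arrive does not affect the posterior.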

We will assume that the economic aspects of the situation have the following
form: if the terminal decision is correct (i.e., if p is in the selected subset)
the decision maker is given A dollars; if the terminal decision is incorrect he
is fined L dollars; and each observation made costs C dollars. The expected gain
of making a terminal decision after having observed r successes in n trials is

(2)  $E(g \mid \alpha+r, \beta+n-r) = A\,P^*(\alpha+r, \beta+n-r) + L\,(1 - P^*(\alpha+r, \beta+n-r)) + nc$

     $= (A-L)\,P^*(\alpha+r, \beta+n-r) + L + nc,$

where c, the per-observation term, is assumed negative, L < A, and where

(3)  $P^*(\alpha+r, \beta+n-r) = \max_i \int_{I_i} f(p \mid \alpha+r, \beta+n-r)\, dp; \quad i = 1, 2, \ldots, s.$

$P^*(\alpha+r, \beta+n-r)$ is the maximum probability of being correct, where maximization
is with respect to the set of terminal actions or decision categories, $\{I_i\}$.
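As a rough illustration of (2) and (3), the sketch below computes $P^*$ and the expected gain of stopping. It assumes positive integer Beta parameters, so that the Beta distribution function can be written exactly as a binomial tail sum, and it assumes the decision subsets are adjacent intervals given by a list of cut points. The names `beta_cdf`, `p_star`, `terminal_gain`, and `cuts` are hypothetical.

```python
from math import comb


def beta_cdf(x, a, b):
    """Beta(a, b) distribution function at x, for positive integer a and b.

    Uses the identity I_x(a, b) = Pr(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def p_star(a, b, cuts):
    """Equation (3): largest posterior probability over the decision intervals."""
    edges = [0.0, *cuts, 1.0]
    return max(beta_cdf(hi, a, b) - beta_cdf(lo, a, b)
               for lo, hi in zip(edges, edges[1:]))


def terminal_gain(a, b, n, A, L, c, cuts):
    """Equation (2): expected gain of deciding now, after n observations (c < 0)."""
    return (A - L) * p_star(a, b, cuts) + L + n * c


# Hypothetical values: uniform prior, three intervals, no observations yet.
print(terminal_gain(1, 1, n=0, A=15, L=0, c=-0.2, cuts=[0.25, 0.75]))
```

For non-integer parameters a numerical incomplete Beta routine would be needed in place of `beta_cdf`.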

To avoid notational difficulties let $\alpha_n = \alpha + r$, $\beta_n = \beta + n - r$,
$\gamma_n = \alpha_n + \beta_n = \alpha + \beta + n$, and $\theta_n = \alpha_n / \gamma_n$. The question which must
now be answered is: what is the expected gain of taking another k observations and
then making a terminal decision, k = 1, 2, ...? Denote this expected gain
$E_k(g \mid \alpha_n, \beta_n)$. It will be shown that

(4)  $E_k(g \mid \alpha_n, \beta_n) = (A-L) \sum_{t=0}^{k} \Pr(t \mid k, \alpha_n, \beta_n)\, P^*(\alpha_n+t, \beta_n+k-t) + L + c(n+k).$

In this formula $\Pr(t \mid k, \alpha_n, \beta_n)$ gives the probability of obtaining t successes
in the k trials when the prior Beta distribution has parameters $(\alpha_n, \beta_n)$. This
probability is given by the Beta-binomial probability function (see Raiffa and
Schlaifer, 1961, p. 237) defined by

(5)  $\Pr(t \mid k, \alpha_n, \beta_n) = \int_0^1 b(t \mid k, p)\, f(p \mid \alpha_n, \beta_n)\, dp$

     $= \frac{(t+\alpha_n-1)!\,(\beta_n+k-t-1)!\,k!\,(\gamma_n-1)!}{t!\,(k-t)!\,(\alpha_n-1)!\,(\beta_n-1)!\,(\gamma_n+k-1)!}.$

(An intuitively appealing derivation of this probability function is given in
Section IV.) Given that t successes have occurred in the k trials, the expected
gain is given by (2) and is found to be

(6)  $E(g \mid \alpha_n+t, \beta_n+k-t) = (A-L)\, P^*(\alpha_n+t, \beta_n+k-t) + L + c(n+k).$

The expression in (4) results from taking the expectation of (6) with respect to t.
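A sketch of (4) and (5) in Python follows. It assumes positive integer parameters (as the factorial form of (5) requires) and decision subsets given as adjacent intervals with cut points `cuts`. All function and argument names are illustrative assumptions.

```python
from math import comb, factorial


def beta_cdf(x, a, b):
    """Beta(a, b) distribution function at x for positive integer a, b."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def p_star(a, b, cuts):
    """Equation (3): maximum probability of being correct."""
    edges = [0.0, *cuts, 1.0]
    return max(beta_cdf(hi, a, b) - beta_cdf(lo, a, b)
               for lo, hi in zip(edges, edges[1:]))


def beta_binomial_pmf(t, k, a, b):
    """Equation (5): probability of t successes in k further trials."""
    num = (factorial(t + a - 1) * factorial(b + k - t - 1)
           * factorial(k) * factorial(a + b - 1))
    den = (factorial(t) * factorial(k - t) * factorial(a - 1)
           * factorial(b - 1) * factorial(a + b + k - 1))
    return num / den


def gain_after_k(k, a, b, n, A, L, c, cuts):
    """Equation (4): expected gain of k more observations, then deciding."""
    expected_p = sum(beta_binomial_pmf(t, k, a, b) * p_star(a + t, b + k - t, cuts)
                     for t in range(k + 1))
    return (A - L) * expected_p + L + c * (n + k)


# Hypothetical values: uniform prior, two intervals split at 1/2.
print(gain_after_k(1, 1, 1, n=0, A=15, L=0, c=-0.2, cuts=[0.5]))
```

Comparing this value with the gain of deciding immediately, as in (2), gives the difference used by the stopping rule.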

In order to take a sample size that will maximize the expected gain, an
additional observation is made if and only if there exists a k such that the
difference

(7)  $\Delta_k(\alpha_n, \beta_n) = E_k(g \mid \alpha_n, \beta_n) - E(g \mid \alpha_n, \beta_n) > 0, \quad k = 1, 2, 3, \ldots.$

Letting

(8)  $\Pi_k(\alpha_n, \beta_n) = \sum_{t=0}^{k} \Pr(t \mid k, \alpha_n, \beta_n)\, P^*(\alpha_n+t, \beta_n+k-t) - P^*(\alpha_n, \beta_n),$

then

(9)  $\Delta_k(\alpha_n, \beta_n) = (A-L)\, \Pi_k(\alpha_n, \beta_n) + ck.$

Therefore a stopping rule equivalent to (7) is to take another observation if
and only if for some k

(10)  $\Pi_k(\alpha_n, \beta_n) > -\frac{ck}{A-L}, \quad k = 1, 2, 3, \ldots.$

Several interesting and useful properties of this sampling rule may be gleaned
from (10). First, since $\Pi_k(\alpha_n, \beta_n)$ is the difference between two probabilities
it can range only between 1 and -1. Only positive values of $\Pi_k$ would lead to
additional sampling, since the term on the right of the inequality will always be
positive under our restrictions. Furthermore, it is obvious that values of k
greater than $-(A-L)/c$ need not be considered, since

$k > -\frac{A-L}{c} \;\Rightarrow\; -\frac{ck}{A-L} > 1.$

Since $\Pi_k$ is bounded above by 1, (10) will never hold for $k \ge -(A-L)/c$.

Feasible values of k can be further restricted by noting that $P^*(\alpha_n, \beta_n) > 0$.
If we let $1 - P^*(\alpha_n, \beta_n) = \varepsilon_n$, then, because the sum in (8) cannot exceed 1,
$\Pi_k$ cannot exceed $\varepsilon_n$, and (10) can be true only for values of k such that

(11)  $k < -\frac{\varepsilon_n (A-L)}{c}.$

Thus only values of k, k = 1, 2, 3, ..., $-\varepsilon_n (A-L)/c$, need be tested.
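The rule in (10), restricted to the values of k allowed by (11), might be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in any of the cited experiments: parameters are positive integers, the decision subsets are adjacent intervals defined by `cuts`, and every name in the code is hypothetical.

```python
from math import comb, factorial


def beta_cdf(x, a, b):
    """Beta(a, b) distribution function at x for positive integer a, b."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def p_star(a, b, cuts):
    """Equation (3): maximum probability of being correct."""
    edges = [0.0, *cuts, 1.0]
    return max(beta_cdf(hi, a, b) - beta_cdf(lo, a, b)
               for lo, hi in zip(edges, edges[1:]))


def beta_binomial_pmf(t, k, a, b):
    """Equation (5) for positive integer a, b."""
    num = (factorial(t + a - 1) * factorial(b + k - t - 1)
           * factorial(k) * factorial(a + b - 1))
    den = (factorial(t) * factorial(k - t) * factorial(a - 1)
           * factorial(b - 1) * factorial(a + b + k - 1))
    return num / den


def pi_k(k, a, b, cuts):
    """Equation (8): expected change in the probability of being correct."""
    expected = sum(beta_binomial_pmf(t, k, a, b) * p_star(a + t, b + k - t, cuts)
                   for t in range(k + 1))
    return expected - p_star(a, b, cuts)


def should_sample(a, b, A, L, c, cuts):
    """Rule (10): sample again if some k below the bound (11) satisfies it."""
    eps = 1.0 - p_star(a, b, cuts)
    k_bound = -eps * (A - L) / c  # c < 0 and A > L, so the bound is >= 0
    k = 1
    while k < k_bound:
        if pi_k(k, a, b, cuts) > -c * k / (A - L):
            return True
        k += 1
    return False


# Hypothetical values: uniform prior, two intervals, cheap then costly observations.
print(should_sample(1, 1, A=15, L=0, c=-0.2, cuts=[0.5]))
print(should_sample(1, 1, A=15, L=0, c=-10.0, cuts=[0.5]))
```

The loop returns as soon as one k satisfies (10), so the full range up to the bound is examined only when the decision is to stop.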

Finally, as n increases without bound $P^*(\alpha_n, \beta_n)$ approaches unity as a result
of the fact that $f(p \mid \alpha_n, \beta_n)$ approaches a point. Therefore, $\varepsilon_n$ goes to zero and

(12)  $\lim_{n \to \infty} \left( -\frac{\varepsilon_n (A-L)}{c} \right) = 0.$

Thus

(13)  $\lim_{n \to \infty} \Pr(\text{sampling stops after } n \text{ observations}) = 1.$

Exceptions to (13) occur only if (A-L) is infinitely large or if c = 0. These
exceptions make good intuitive sense. If A is infinitely large or L infinitely
negative, or if observations are free, then sampling might well be expected to
continue for an indefinite period.

It will be of practical value to develop approximations to the rule given by
(10). Two types of approximation will be suggested, but research is needed to
determine how good the approximations are. The difficulty with (10) is that in
some cases it may be necessary to test $\Pi_k$ for many values of k before a decision
can be made as to whether to take another observation. For example, in an
experiment performed by Messick (1964), A = 15, L = 0, and c = -.2. The maximum k
is thus $-(A-L)/c = 75$. Clearly the procedure developed here would not be feasible
if $\Pi_k$ had to be tested for all values of k between 1 and 75. One way in which this
difficulty could be overcome is to test for k = 1, 2, 3, ..., u, where u is
predetermined by the experimenter on the basis of the degree of accuracy desired. An
alternative procedure for approximating (10) would involve selecting more or less
arbitrary values of k, perhaps by some random procedure. For example, one might
test for k = 1, 2, 6, 16, 23. This latter procedure would be free of any bias which
might be involved in testing only small values of k, but it would be more time
consuming computationally.
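The two approximations differ only in which values of k are submitted to the test in (10). A small sketch of how the candidate sets might be generated is given below; the function names, the fixed random seed, and the choice of five random values are assumptions made for illustration.

```python
import random


def k_max(A, L, c):
    """Largest k worth considering, -(A - L) / c, with c < 0."""
    return -(A - L) / c


def first_u_values(u, A, L, c):
    """First approximation: test k = 1, 2, ..., u (never beyond the maximum k)."""
    bound = k_max(A, L, c)
    return [k for k in range(1, u + 1) if k <= bound]


def random_values(m, A, L, c, seed=0):
    """Second approximation: m distinct values of k drawn at random."""
    bound = int(k_max(A, L, c))
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, bound + 1), min(m, bound)))


# Values from the example in the text: A = 15, L = 0, c = -.2.
print(k_max(15, 0, -0.2))
print(first_u_values(10, 15, 0, -0.2))
print(random_values(5, 15, 0, -0.2))
```

Either list would then be passed, value by value, to whatever routine evaluates $\Pi_k$.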

III. A "no information" theorem.


Let $I_i$ be a subset of [0, 1] such that $I_i = [x, 1-x]$, and such that

(14)  $P^*(\alpha, \beta) = \int_x^{1-x} f(p \mid \alpha, \beta)\, dp$, when $\alpha = \beta$; and

(15)  $P^*(\alpha+1, \beta) = \int_x^{1-x} f(p \mid \alpha+1, \beta)\, dp$, and

(16)  $P^*(\alpha, \beta+1) = \int_x^{1-x} f(p \mid \alpha, \beta+1)\, dp.$

Then,

(17)  $P^*(\alpha, \beta) = P^*(\alpha+1, \beta) = P^*(\alpha, \beta+1).$

The equivalence of $P^*(\alpha+1, \beta)$ and $P^*(\alpha, \beta+1)$ follows from:

(18)  $\int_0^x f(p \mid \alpha, \beta)\, dp = 1 - \int_0^{1-x} f(p \mid \beta, \alpha)\, dp.$

The second equality in (17) will be assumed and a complete proof of the first will
be given.

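Writing (14) in full we have

$P^*(\alpha, \beta) = P^*(\alpha, \alpha) = \frac{\Gamma(2\alpha)}{\Gamma(\alpha)\Gamma(\alpha)} \int_x^{1-x} p^{\alpha-1}(1-p)^{\alpha-1}\, dp.$

However, since the Beta distribution is symmetric when $\alpha = \beta$, this can be written
as

(19)  $P^*(\alpha, \beta) = 1 - 2\, \frac{\Gamma(2\alpha)}{\Gamma(\alpha)\Gamma(\alpha)} \int_0^x p^{\alpha-1}(1-p)^{\alpha-1}\, dp.$

Writing (15) in full we have

(20)  $P^*(\alpha+1, \beta) = \frac{\Gamma(2\alpha+1)}{\Gamma(\alpha+1)\Gamma(\alpha)} \int_x^{1-x} p^{\alpha}(1-p)^{\alpha-1}\, dp$

      $= \frac{\Gamma(2\alpha+1)}{\Gamma(\alpha+1)\Gamma(\alpha)} \int_0^{1-x} p^{\alpha}(1-p)^{\alpha-1}\, dp - \frac{\Gamma(2\alpha+1)}{\Gamma(\alpha+1)\Gamma(\alpha)} \int_0^x p^{\alpha}(1-p)^{\alpha-1}\, dp$

      $= 1 - \frac{\Gamma(2\alpha+1)}{\Gamma(\alpha)\Gamma(\alpha+1)} \int_0^x p^{\alpha-1}(1-p)^{\alpha}\, dp - \frac{\Gamma(2\alpha+1)}{\Gamma(\alpha+1)\Gamma(\alpha)} \int_0^x p^{\alpha}(1-p)^{\alpha-1}\, dp.$

To prove (17) we show that (19) minus (20) is identically zero. First notice that

$\frac{\Gamma(2\alpha+1)}{\Gamma(\alpha+1)\Gamma(\alpha)} = \frac{2\alpha\, \Gamma(2\alpha)}{\alpha\, \Gamma(\alpha)\Gamma(\alpha)} = 2\, \frac{\Gamma(2\alpha)}{\Gamma(\alpha)\Gamma(\alpha)}.$

Subtracting (20) from (19) we have

(21)  $P^*(\alpha, \beta) - P^*(\alpha+1, \beta) = -2\, \frac{\Gamma(2\alpha)}{\Gamma(\alpha)\Gamma(\alpha)} \int_0^x \left[ p^{\alpha-1}(1-p)^{\alpha-1} - p^{\alpha-1}(1-p)^{\alpha} - p^{\alpha}(1-p)^{\alpha-1} \right] dp$

      $= -2\, \frac{\Gamma(2\alpha)}{\Gamma(\alpha)\Gamma(\alpha)} \int_0^x p^{\alpha-1}(1-p)^{\alpha-1} \left( 1 - (1-p) - p \right) dp$

      $= 0.$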


This theorem has the following interpretation. If $\alpha = \beta$, then the mean of the
prior distribution is 1/2. This value may be taken as the expected probability of
a success on the next trial. If one's best terminal act at this point is to
select the decision interval centered at 1/2, and if this same terminal act remains
optimal after another observation is taken, regardless of the outcome of the
observation, then the probability of being correct before taking the observation is
identical to the probability of being correct if the terminal act is selected after
the one observation is made. From this point of view, the observation is
noninformative.
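The equality in (17) can be checked numerically. The sketch below assumes positive integer parameters, so that the Beta distribution function is an exact binomial tail sum, and uses hypothetical names `beta_cdf` and `central_probability`. It checks only that the three integrals over [x, 1-x] agree. Whether that interval is in fact the optimal terminal act is a separate condition of the theorem.

```python
from math import comb


def beta_cdf(x, a, b):
    """Beta(a, b) distribution function at x for positive integer a, b."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def central_probability(x, a, b):
    """Probability that p lies in [x, 1 - x] under Beta(a, b)."""
    return beta_cdf(1 - x, a, b) - beta_cdf(x, a, b)


# Compare the symmetric prior with the posteriors after one success or one failure.
for alpha in (1, 2, 5):
    for x in (0.1, 0.3, 0.45):
        before = central_probability(x, alpha, alpha)
        after_success = central_probability(x, alpha + 1, alpha)
        after_failure = central_probability(x, alpha, alpha + 1)
        print(alpha, x, before, after_success, after_failure)
```

In each printed row the last three numbers should agree up to rounding error.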

IV.  The  Beta-binomial  distribution:  an  intuitive  derivation. 


The binomial distribution,

(22)  $b(r \mid n, p) = \binom{n}{r} p^r (1-p)^{n-r},$

gives the probability of obtaining r "successes" in n independent Bernoulli trials
when the probability of one success is p, 0 < p < 1. In this case, p is constant.
The Beta-binomial probability distribution,

(23)  $\Pr(r \mid n, \alpha, \beta) = \int_0^1 b(r \mid n, p)\, f(p \mid \alpha, \beta)\, dp$

      $= \frac{(r+\alpha-1)!\,(\beta+n-r-1)!\,n!\,(\alpha+\beta-1)!}{r!\,(n-r)!\,(\alpha-1)!\,(\beta-1)!\,(\alpha+\beta+n-1)!},$

gives the probability of obtaining r successes in n trials when p is not known
but is instead treated as a random variable having a distribution of the Beta
family with parameters $(\alpha, \beta)$.

To derive the Beta-binomial distribution, use will be made of the fact that
the first moment of the Beta distribution, which may be interpreted as the expected
probability of a success, is given by

(24)  $\theta = \frac{\alpha}{\alpha+\beta},$

and therefore the expected probability of a "non-success" is

(25)  $1 - \theta = \frac{\beta}{\alpha+\beta}.$

If n observations are taken, r of which are successes, then the posterior Beta
distribution has parameters $(\alpha+r, \beta+n-r)$. The mean of this posterior distribution
is

(26)  $\theta_n = \frac{\alpha+r}{\alpha+\beta+n},$

and

(27)  $1 - \theta_n = \frac{\beta+n-r}{\alpha+\beta+n}.$

We will only be concerned with the special case in which n = 1. Given $(\alpha, \beta)$ we
wish to find the probability of obtaining r successes in n trials.

To clarify the argument, consider the grid in Figure 1. A person begins at
the origin with coordinates $(\beta, \alpha)$. If a success occurs our decision maker moves
up one step. If a non-success occurs, he moves to the right one step. From any
point on the grid, the probability of moving up a step is the ratio of the ordinate
of the point to the sum of the coordinates, and the probability of going to the
right is the ratio of the abscissa to the sum of the coordinates.

We first note that the sum of the coordinates of any point is $\alpha+\beta+n$, where n
is the number of steps (trials) required to reach the point from the origin.

Second, the numerator of the probability of moving one step to the right does not
depend on the ordinate, and the numerator of the probability of moving one step up
does not depend on the abscissa. Thus the numerator of the probability of moving
from $(\beta, \alpha)$ to $(\beta+1, \alpha)$ is $\beta$, which is the same as the numerator of the
probability of moving from $(\beta, \alpha+4)$ to $(\beta+1, \alpha+4)$. Finally we note that the
probability of going to any specified point in the grid from the origin via a
specified path is the product of the probabilities of each of the component moves
in the path.

Figure 1: Grid representing sequential sampling procedure. The origin is
$(\beta, \alpha)$.
The major result to establish is that all paths to a point from the origin
have the same probability. To prove this, first consider the numerator of the
probability of moving from $(\beta, \alpha)$ to $(\beta+s, \alpha+r)$. By the argument given in the
above paragraph, the numerator of the probability of going s steps to the right and
r steps up is given by $\beta(\beta+1) \cdots (\beta+s-1)\, \alpha(\alpha+1) \cdots (\alpha+r-1)$ regardless of the
order in which the steps occur. This number may be written as

$\frac{(\beta+s-1)!\,(\alpha+r-1)!}{(\beta-1)!\,(\alpha-1)!}.$

The denominator of the probability of moving in either direction from a point
is the sum of the coordinates of the point, and, as stated above, this number
depends only on the number of steps required to reach the point from the origin.
Since the denominator of the product equals the product of the denominators of the
step by step probabilities, it will be equal to $(\alpha+\beta)(\alpha+\beta+1) \cdots (\alpha+\beta+s+r-1)$,
which can be written

$\frac{(\alpha+\beta+s+r-1)!}{(\alpha+\beta-1)!}.$

Therefore, the probability of going from $(\beta, \alpha)$ to $(\beta+s, \alpha+r)$ by any specified
path is

(28)  $\Pr[\text{any single path from } (\beta, \alpha) \text{ to } (\beta+s, \alpha+r)] = \frac{(\beta+s-1)!\,(\alpha+r-1)!\,(\alpha+\beta-1)!}{(\beta-1)!\,(\alpha-1)!\,(\alpha+\beta+s+r-1)!}.$
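The claim that every path to a given point has the same probability can be checked by walking the grid step by step. The sketch below uses exact rational arithmetic and positive integer parameters. The function name `walk_probability` and the path encoding ('U' for a success, 'R' for a non-success) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def walk_probability(path, alpha, beta):
    """Probability of one specified path on the grid starting at (beta, alpha)."""
    ordinate, abscissa = alpha, beta
    prob = Fraction(1)
    for step in path:
        total = ordinate + abscissa
        if step == "U":      # success: move up one step
            prob *= Fraction(ordinate, total)
            ordinate += 1
        elif step == "R":    # non-success: move right one step
            prob *= Fraction(abscissa, total)
            abscissa += 1
        else:
            raise ValueError("path may contain only 'U' and 'R'")
    return prob


def single_path_formula(r, s, alpha, beta):
    """Equation (28) for positive integer alpha, beta."""
    num = factorial(beta + s - 1) * factorial(alpha + r - 1) * factorial(alpha + beta - 1)
    den = factorial(beta - 1) * factorial(alpha - 1) * factorial(alpha + beta + s + r - 1)
    return Fraction(num, den)


# Hypothetical example: every ordering of two successes and three non-successes.
orderings = {"".join(p) for p in permutations("UURRR")}
values = {walk_probability(path, 2, 3) for path in orderings}
print(len(orderings), values, single_path_formula(2, 3, 2, 3))
```

Because the set `values` collapses to a single element, the ten orderings share one probability, which matches the value given by (28).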


Finally, the number of paths from $(\beta, \alpha)$ to $(\beta+s, \alpha+r)$ is

$\binom{s+r}{r} = \binom{s+r}{s} = \frac{(s+r)!}{r!\,s!},$

and the probability of going from $(\beta, \alpha)$ to $(\beta+s, \alpha+r)$ regardless of the path is

$\Pr(r \mid r+s, \alpha, \beta) = \frac{(s+r)!}{r!\,s!} \cdot \frac{(\beta+s-1)!}{(\beta-1)!} \cdot \frac{(\alpha+r-1)!}{(\alpha-1)!} \cdot \frac{(\alpha+\beta-1)!}{(\alpha+\beta+s+r-1)!}.$

Setting n = s+r and s = n-r, the above expression is the Beta-binomial distribution
given in (23).

This derivation of the Beta-binomial distribution is of practical value for two
reasons. First, it provides a simpler computational procedure for finding
probabilities than that in (23), which entails evaluating the factorials. Second, it
becomes apparent from this point of view that the limiting distribution of the
Beta-binomial is the binomial. As $\alpha$ and $\beta$ increase, the ratio $\alpha/(\alpha+\beta)$ approaches p
(see Raiffa and Schlaifer, 1961). Therefore

(29)  $\lim_{\alpha, \beta \to \infty} \frac{\alpha(\alpha+1) \cdots (\alpha+r-1)\, \beta(\beta+1) \cdots (\beta+s-1)}{(\alpha+\beta)(\alpha+\beta+1) \cdots (\alpha+\beta+s+r-1)} = p^r (1-p)^s = p^r (1-p)^{n-r}.$
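Both points can be illustrated with a short sketch that evaluates the Beta-binomial probability as a running product of step probabilities, with no factorials of the parameters, and compares it with the binomial for large parameters. The function names and the particular values (p = 0.3, n = 5, parameters summing to one million) are assumptions chosen for illustration.

```python
from math import comb


def beta_binomial_product(r, n, alpha, beta):
    """Beta-binomial probability as (number of paths) x (one path's probability)."""
    prob = float(comb(n, r))
    steps_taken = 0
    for i in range(r):          # r up-steps (successes)
        prob *= (alpha + i) / (alpha + beta + steps_taken)
        steps_taken += 1
    for j in range(n - r):      # n - r right-steps (non-successes)
        prob *= (beta + j) / (alpha + beta + steps_taken)
        steps_taken += 1
    return prob


def binomial_pmf(r, n, p):
    """Equation (22)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Hold alpha / (alpha + beta) at p = 0.3 and let the parameters grow.
p, n = 0.3, 5
for scale in (10, 1_000, 1_000_000):
    alpha, beta = p * scale, (1 - p) * scale
    gap = max(abs(beta_binomial_product(r, n, alpha, beta) - binomial_pmf(r, n, p))
              for r in range(n + 1))
    print(scale, gap)
```

The product form also accepts non-integer parameters, which the factorial form in (23) does not.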


Summary 

In  this  report  an  optimal  strategy  is  presented  for  sequential  information 
seeking.  The  important  variable  in  determining  a  stopping  rule  is  found  to  be  the 
expected  probability  of  being  correct.  The  criterion  involves  only  the  economic 
aspects  of  the  situation.  A  "no  information"  theorem  is  presented  which  shows 
that  under  some  circumstances  when  a  "success"  or  a  "failure"  on  a  given  trial  are 
equally  probable,  the  probability  of  being  correct  after  making  the  observation 
is  identical  to  the  probability  of  being  correct  before  the  observation  was  taken. 
Finally,  an  appealing  derivation  of  the  Beta-binomial  probability  function  was 
given  which  suggests  a  more  tractable  computational  procedure  for  the  distribution 
and  which  illuminates  its  limiting  distribution. 